The exponential growth in the rate at which information can be communicated
through an optical fiber is a key element in the so-called information revolution.
However, as with all exponential growth laws, there are physical limits to be considered.
The nonlinear nature of the propagation of light in optical fiber has made
these limits difficult to elucidate. Here we obtain basic insights into the limits 
to the information capacity of an optical fiber arising from these nonlinearities. 
The key simplification lies in relating the nonlinear channel to a linear channel 
with multiplicative noise, for which we are able to obtain analytical results. In 
fundamental distinction to the linear additive noise case, the capacity does not 
grow indefinitely with increasing signal power, but has a maximal value. The ideas 
presented here have broader implications for other nonlinear information channels, 
such as those involved in sensory transduction in neurobiology. These have often
been examined using additive noise linear channel models, and as we show here,
nonlinearities can change the picture qualitatively. 

The classical theory of communications à la Shannon was developed mostly in the context
of linear channels with additive noise, which was adequate for electromagnetic propagation
through wires and cables that have until recently been the main conduits for information flow.
Fading channels, or channels with multiplicative noise, have been considered, for example in
the context of wireless communications [2], although such channels remain theoretically less
tractable than the additive noise channels. However, with the advent of optical fiber
communications we are faced with a nonlinear propagation channel that poses major challenges to
our understanding. The difficulty resides in the fact that the input-output relationship of an
optical fiber channel is obtained by integrating a nonlinear partial differential equation and
may not be represented by an instantaneous nonlinearity. Channels where the nonlinearities in
the input-output relationship are not instantaneous are in general ill understood, the optical
fiber simply being a case of current relevance. The understanding of such nonlinear channels
with memory is of fundamental interest, both because communication rates through optical
fiber are increasing exponentially and we need to know where the limits are, and also because
understanding such channels may give us insight elsewhere, such as into the design principles
of neurobiological information channels at the sensory periphery.

The capacity of a communication channel is the maximal rate at which information may be
transferred through the channel without error. The capacity can be written as a product of two
conceptually distinct quantities, the spectral bandwidth $W$ and the maximal spectral efficiency,
which we will denote $\mathcal{C}$. In the classic capacity formula for the additive white Gaussian noise
channel with an average power constraint, $C = W \log(1 + S/N)$, the spectral bandwidth $W$,
which has dimensions of inverse time, multiplies the dimensionless maximal spectral efficiency
$\mathcal{C} = \log(1 + S/N)$. Here $S$ and $N$ are the signal and noise powers respectively. It is instructive to
examine this formula in the context of an optical fiber. Since the maximal spectral efficiency is
logarithmic in the signal to noise ratio (SNR), it can never be too large in a realistic situation, so
that the capacity is principally determined by the bandwidth $W$. In the case of an optical fiber,
the intrinsic loss mechanisms of light propagating through silica fundamentally limit $W$ to a
maximum of about 50 THz [3], corresponding to a wavelength range of about 400 nm (1.2 to 1.6 μm).
This is to be compared with current systems where the total bandwidth is limited to about
15 THz. If the channel were linear, the maximal spectral efficiency would be $\mathcal{C} = \log(1 + S/N)$,
$S$ being input light intensity and $N$ the intensity of amplified spontaneous emission noise in the
system. An output SNR of say 100 (i.e. 20 dB) would then yield a spectral efficiency of 6.6
(with the logarithm taken to base 2), which for a 50 THz channel would correspond to a capacity
of 330 Tbit/sec. The channel, of
course, is not linear; how do the nonlinearities impact the spectral efficiency of the fiber? The
basic conclusion of the present work is that the impact is severe and qualitative. As shown in
Fig. 1, the effect is a saturation and eventual decline of spectral efficiency as a function of input
signal power, in complete contrast with the linear channel case. We now proceed to motivate
and discuss this result.
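
The arithmetic behind the linear-channel figures quoted above can be reproduced with a short sketch. It assumes the logarithm is taken to base 2 (so that the result is in bits) and uses only the SNR of 100 and the 50 THz bandwidth given in the text; the function name is illustrative.

```python
import math

def shannon_spectral_efficiency(snr: float) -> float:
    """Spectral efficiency log2(1 + S/N) of a linear additive-noise channel."""
    return math.log2(1.0 + snr)

snr = 100.0            # output SNR of 20 dB, as quoted in the text
bandwidth_hz = 50e12   # 50 THz, the loss-limited bandwidth quoted in the text

efficiency = shannon_spectral_efficiency(snr)
capacity_bps = bandwidth_hz * efficiency

print(f"spectral efficiency: {efficiency:.2f} bits/s/Hz")   # 6.66
print(f"capacity: {capacity_bps / 1e12:.0f} Tbit/s")        # 333
```

The text rounds the spectral efficiency down to 6.6 before multiplying, which gives the quoted 330 Tbit/sec.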

It is widely recognised that nonlinearities impair the channel capacity. However, estimation 
of the impact of the nonlinearities on channel capacity has remained ad hoc from an information 
theory perspective. Here we obtain what appears to be the first systematic estimates (Fig. 1)
for the maximal spectral efficiency of an optical fiber channel as a function of the relevant 
parameters. In basic distinction to the linear channel, our considerations indicate that the 
maximal spectral efficiency does not grow indefinitely with signal power, but reaches a maximum 
of several bits and eventually declines, as illustrated in Figure 1. It is to be noted that current 
systems use a binary signalling scheme which limits the achievable spectral efficiency a priori to 
1 bit, and to reach the higher spectral efficiencies predicted by the theory, multi-bit signalling 
schemes would have to be used. Since the spectral efficiencies of current systems are already 
approaching 1 bit, it is clear that the limits discussed here will be of practical relevance in the 
future. 

Although a number of nonlinearities are present in light propagation in a fiber, we concentrate
on the most important one for fiber communications, namely the dependence of the refractive
index (and therefore the propagation velocity of light) on the light intensity, $n = n_0 + n_2 I$.
This nonlinearity is weak, but its effects accumulate due to the long propagation distances
involved in fibre communications, and it is responsible for the effects considered here. Three
principal physical parameters characterising the propagation are of interest: the group velocity
dispersion $\beta \sim 10\ \mathrm{ps}^2/\mathrm{km}$, the propagation loss $\alpha \sim 0.2$ dB/km and the strength of the
nonlinear refractive index, usually expressed in terms of the parameter $\gamma \sim 1$/W/km. The
propagation loss is compensated by interposing optical amplifiers into the system. Each amplifier
also injects spontaneous emission noise into the system with strength $I_1 = a G h \nu \Delta\nu$ [4],
with $G$ being the amplifier gain, $h$ Planck's constant, and $\nu$ and $\Delta\nu$ the centre frequency
and frequency bandwidth of light respectively. Here $a$ is a numerical constant (which we
assume to be 2). For $n_s$ spans of fiber interspersed with amplifiers that make the total channel
gain unity, the effects of absorption may be accounted for simply by redefining the system
length in terms of an effective length, $L_{eff} \sim n_s/\alpha$. If the nonlinearity were absent ($\gamma = 0$), we
would have obtained, for the maximal spectral efficiency, $\mathcal{C}_0 = \log(1 + I/I_n)$, $I$ being the input
power and $I_n = n_s I_1$ being the total additive noise power. Note that $\mathcal{C}_0$ declines logarithmically
with system length, and would eventually vanish for infinitely long systems. Note also that
although spectral efficiency is dimensionless, it is often written for convenience with the "units"
bits/sec/Hz.
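
To give a feel for the magnitudes involved, the following sketch evaluates the per-amplifier noise strength and the linear ($\gamma = 0$) spectral efficiency. The gain, centre frequency and bandwidth are the values listed in the caption of Figure 1; the 1 mW input power and the span counts are illustrative assumptions, and the logarithm is taken to base 2 so that the result is in bits.

```python
import math

PLANCK_H = 6.62607015e-34  # Planck's constant in J*s

def ase_noise_power(gain, centre_freq_hz, bandwidth_hz, a=2.0):
    """Per-amplifier spontaneous emission noise a*G*h*nu*dnu, in watts."""
    return a * gain * PLANCK_H * centre_freq_hz * bandwidth_hz

def linear_spectral_efficiency(input_power_w, n_spans, noise_per_amp_w):
    """log2(1 + I / I_n) with total noise I_n = n_s * I_1 (no nonlinearity)."""
    return math.log2(1.0 + input_power_w / (n_spans * noise_per_amp_w))

# G = 1000, nu = 200 THz, B = 10 GHz: values from the Figure 1 caption.
i1_w = ase_noise_power(gain=1000.0, centre_freq_hz=200e12, bandwidth_hz=10e9)
print(f"I_1 = {i1_w * 1e6:.2f} microwatts")

# Assumed input power of 1 mW; each tenfold increase in the number of
# spans costs roughly log2(10) bits at high SNR.
for n_spans in (1, 10, 100):
    print(n_spans, round(linear_spectral_efficiency(1e-3, n_spans, i1_w), 2))
```

The slow, logarithmic loss of spectral efficiency with the number of spans is the linear-channel behaviour against which the nonlinear result is later compared.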

For a variety of reasons, the principal one being limitations in the electronic bandwidth, it
is impractical to modulate the full optical bandwidth at once. Instead, current attempts towards
achieving maximal information throughput involve so-called Wavelength Division Multiplexing
(WDM) [5], where the whole optical bandwidth is broken up into disjoint frequency bands
("channels"), each of which is modulated separately. We confine our attention to such systems
(which from an information theory perspective correspond to the "multi-user" case) [6], though
we also comment on the ideal case of utilising the full optical bandwidth for a single data stream
(the "single user" case). Quantitatively, the single user case is expected to have larger maximal
spectral efficiencies, though we will argue that it shows the same qualitative behaviour as the
multi-user case. The difference between the two resides in the fact that in the multi-user case,
each channel is an independent information stream, and appears as an additional source of
noise to every other channel due to nonlinear mixing.

The nonlinear propagation effects in the evolution of the electric field amplitude involve 
a cubic term in the electric field. In a WDM system, the nonlinearities are classified by the 
field amplitudes participating in this cubic term for the evolution of the field amplitude of 
a given channel: self phase modulation denotes the case where all three fields belong to the 
same channel, cross phase modulation where two fields belong to a different channel and one 
to the same channel, and four wave mixing denotes the case where all three amplitudes belong 
to different channels. Out of these terms, four wave mixing gives rise to additive noise to 
the channel of interest and will not be considered further in this paper. One reason for this 
is that four wave mixing is strongly suppressed by dispersion when the channel spacings are 
substantial. Its effects can be accounted for by augmenting the additive noise term in the 
subsequent considerations. We also neglect self phase modulation effects, since these effects 
are deterministic for the given channel and in principle could be reduced by using nonlinear 
precompensation. Finally, we are left with cross phase modulation, which appears to be the
principal source of nonlinear capacity impairment in the multi-user case for realistic parameter
ranges. A further reason for our focus on cross phase modulation is that it gives rise to
multiplicative noise, which leads to qualitatively new effects in the channel capacity.

We model the propagation channel in the presence of cross phase modulation by means
of a linear Schroedinger equation with a random potential fluctuating both in space and time.
This is easily justified starting from the nonlinear Schroedinger equation description commonly
used to describe light propagation in single mode optical fibres [4]. Cross phase modulation
arises from terms in the equation where the field intensity in the nonlinear refractive index is
approximated by the sum of the field intensities in the channels other than the one for which the
propagation is being studied. Therefore, if only cross phase modulation effects are retained,
the propagation equation for the field amplitude in channel $i$ becomes

$i\,\partial_z E_i = \frac{\beta}{2}\,\partial_t^2 E_i + V(z,t)\,E_i$, (1)

where $V(z,t) = -2\gamma \sum_{j \neq i} |E_j(z,t)|^2$, the sum being taken over the other channels. Since
independent streams of information are transmitted in the other channels, $V(z,t)$ appears as
a random noise term. Notice that the nonlinear propagation equation has now been reduced
to a linear Schroedinger equation with a stochastic potential, so that the nonlinear channel
has become a channel with multiplicative noise. We now need an adequate model for the
stochastic properties of $V(z,t)$. If the dispersion is substantial, we propose that $V(z,t)$ may
be approximated by a Gaussian stochastic process short range correlated in both space and
time. Since $V$ is obtained by adding a large number of different channels, each of which is
short range correlated in time ($\tau \sim 1/B$, where $B$ is the channel bandwidth), we can expect $V$
to have a correlation time of approximately $1/B$. Dispersion causes the channels to travel at
different speeds, thus causing $V$ to be short range correlated in space as well, with a correlation
length related to the dispersion length. Since $V$ is a sum of intensities, it has nonzero mean,
so we define $\delta V(z,t) = V(z,t) - \langle V \rangle$, where $\langle V \rangle$ denotes the average value of $V$. Removing a
constant from the potential causes an overall phase shift independent of space and time, which
is irrelevant to the present considerations.

The parameter of interest in the following is the integrated strength of the fluctuating
field, $\eta = \int dz\, \langle \delta V(z,0)\, \delta V(0,0) \rangle$. In order to estimate $\eta$, we consider a simplified propagation
model for the channels other than the one of interest, in which nonlinearities are neglected,
and stochastic bit streams at the inputs to the channels are propagated forward with constant
group velocities. The group velocity difference between two channels separated by a spacing
$\Delta\lambda$ is $D\Delta\lambda$. In this model with $n_c$ other channels evenly spaced by $\Delta\lambda$ around the channel of
interest, each with intensity $I$ and bandwidth $B$, we obtain $\eta = 2\ln(n_c/2)(\gamma I)^2/(B D \Delta\lambda)$. Here
$D$ is the dispersion parameter, $D = -2\pi c \beta/\lambda^2$. Although this is a simplified model for the other
channels, numerical simulations of propagation including the nonlinearities and dispersion for
the side channels show that the estimate of $\eta$ is accurate.

Note that the denominator in the expression for $\eta$ is the inverse of the dispersion length
$L_D$ for the given channel spacing. This form for $\eta$ follows from assuming that $L_{eff} \gg L_D$,
since in this limit the integral defining $\eta$ is cut off by $L_D$. If on the other hand $L_{eff} < L_D$, the
integral would be cut off by $L_{eff}$, so that one would have to replace $L_D$ by $L_{eff}$ in the equation
for $\eta$. The fluctuation strength scales with the logarithm of the number of channels rather than
the total number, since contributions from channels at larger spacings are suppressed in
proportion to the channel spacing. This suppression due to dispersion leads to the logarithmic
factor via a sum of the form $\sum_j 1/\Delta\lambda_j \propto \sum_j 1/j$.

Within the model under consideration, the propagation down the fiber is given in terms
of a propagator $U(t,t';L)$ obtained by integrating the stochastic Schroedinger equation. For
simplicity, we model the amplifier noise as an additive term with strength $I_n$ as defined earlier.
The channel is specified in terms of a relation between the input and output electric field
amplitudes, $E_{out}(t) = \int dt'\, U(t,t';L)\, E_{in}(t') + n(t)$. Since $U$ is stochastic, due to the underlying
stochasticity of $V(z,t)$, the model corresponds to a channel with multiplicative noise. It is
still intractable in terms of an exact capacity computation, but an analytic lower bound may
now be obtained. This bound is based on the following information theoretic result (E. Telatar,
private communications): the capacity $C$ of a channel with input $X$ and output $Y$ related
by a conditional distribution $p(Y|X)$ and an input power constraint $E(\|X\|^2) = P$ satisfies
the inequalities $C = \max_{p(X)} I(X,Y) \ge I(X_G,Y) \ge I(X_G,Y_G)$. Here $I(X_G,Y)$ is the mutual
information when $p(X)$ is chosen to be $p_G(X)$, a Gaussian satisfying the power constraint;
$I(X_G,Y_G)$ is the mutual information of a pair $(X_G,Y_G)$ with the same second moments as the
pair $(X_G,Y)$. The first inequality is trivial since $p_G(X)$ is not necessarily the optimal input
distribution. A proof of the second inequality is outlined in the methods section.

The quantity $I(X_G,Y_G)$ for the channel defined above may be computed from knowledge of
the correlators $\langle E_{in}(t) E^*_{in}(t') \rangle$, $\langle E_{out}(t) E^*_{out}(t') \rangle$ and $\langle E_{out}(t) E^*_{in}(t') \rangle$. The first is defined a priori
through the assumption of bandlimited Gaussian white noise input with a power constraint. The
second follows from the first using the unitarity of $U$. The third correlator requires computation
of the average propagator $\langle U \rangle$, where the average is over realisations of $V(z,t)$. For a Gaussian,
delta-correlated $V$, we obtain $\langle U(t,t';L) \rangle = \exp(-\eta L/2)\, U_0(t-t';L)$ (see methods), where $U_0$
is the propagator for $V = 0$. Assembling these results, we finally obtain an analytic expression
for a lower bound $C_{LB}$ to the channel capacity of the stochastic Schroedinger equation model:

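$C_{LB} = n_c B \ln\left(1 + \frac{e^{-(I/I_0)^2}\, I}{I_n + \left(1 - e^{-(I/I_0)^2}\right) I}\right)$, (2)

where $I_0$ is given by

$I_0 = \sqrt{\frac{B D \Delta\lambda}{2 \gamma^2 \ln(n_c/2)\, L_{eff}}}$. (3)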
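
The step from the three correlators to the bound can be made explicit. The following is a sketch under stated assumptions rather than the authors' own derivation: it treats a single complex Gaussian mode with input power $I$ and additive noise power $I_n$, uses the standard expression for the mutual information of two jointly Gaussian complex variables, and identifies $\eta L$ with $(I/I_0)^2$ on the grounds that $\eta$ is quadratic in $I$.

```latex
\begin{align*}
\langle |E_{in}|^2 \rangle &= I, \qquad
\langle |E_{out}|^2 \rangle = I + I_n, \qquad
\langle E_{out} E_{in}^* \rangle = e^{-\eta L/2}\, I, \\
I(X_G, Y_G) &= \ln\!\left(1 +
  \frac{|\langle E_{out} E_{in}^* \rangle|^2}
       {\langle |E_{in}|^2 \rangle \langle |E_{out}|^2 \rangle
        - |\langle E_{out} E_{in}^* \rangle|^2}\right) \\
&= \ln\!\left(1 + \frac{e^{-\eta L}\, I}{I_n + (1 - e^{-\eta L})\, I}\right),
\qquad \eta L = (I/I_0)^2 .
\end{align*}
```

Read this way, the coherent part of the signal is attenuated by $e^{-\eta L}$, while the remainder $(1 - e^{-\eta L}) I$ adds to the noise. Multiplying by $B$ independent samples per second in each of $n_c$ channels gives a capacity of the form $n_c B \ln(1 + \cdot)$.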

The fundamental departure from a linear channel in the above capacity expression is the
appearance of an intensity scale $I_0$, which governs the onset of nonlinear effects. To obtain an
idea about the value of $I_0$, consider the parameter values $B = 40$ GHz, $D = 20$ ps/nm/km, $\Delta\lambda = 1$ nm,
$\gamma = 1$/W/km, $n_c = 100$, $L_{eff} \sim n_s/\alpha = 100$ km. Then $I_0 = 32$ mW. Examination of Eq. 3
shows that the intensity scale $I_0$ at which nonlinearities set in shows reasonable dependence
on all relevant parameters, namely it increases with increases in the dispersion, the bandwidth
and the channel spacing, but decreases with increasing system length and number of channels.
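
The quoted value of 32 mW can be checked directly. The sketch below assumes that the intensity scale has the form $I_0 = \sqrt{B D \Delta\lambda / (2 \gamma^2 \ln(n_c/2) L_{eff})}$, which is what the expression for $\eta$ implies when $\eta L_{eff}$ is set equal to $(I/I_0)^2$; the function and argument names are illustrative.

```python
import math

def nonlinear_intensity_scale(b_hz, d_ps_nm_km, dlambda_nm, gamma_w_km, n_c, l_eff_km):
    """Intensity scale I_0 in watts, under the assumed form stated above."""
    walkoff_s_per_km = d_ps_nm_km * 1e-12 * dlambda_nm   # D * dlambda
    inv_dispersion_length = b_hz * walkoff_s_per_km      # B * D * dlambda, in 1/km
    return math.sqrt(
        inv_dispersion_length
        / (2.0 * gamma_w_km ** 2 * math.log(n_c / 2) * l_eff_km)
    )

# Parameter values quoted in the text.
i0_w = nonlinear_intensity_scale(40e9, 20.0, 1.0, 1.0, 100, 100.0)
print(f"I_0 = {i0_w * 1e3:.1f} mW")   # 32.0 mW

# Doubling the system length lowers I_0 by a factor sqrt(2).
print(f"{nonlinear_intensity_scale(40e9, 20.0, 1.0, 1.0, 100, 200.0) * 1e3:.1f} mW")
```

The same function shows the qualitative dependences listed above: $I_0$ grows with $B$, $D$ and $\Delta\lambda$, and falls with $L_{eff}$ and $n_c$.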

The most striking feature of Eq. 2 is that instead of increasing logarithmically with signal
intensity as in the linear case, the capacity estimate actually peaks and then declines beyond
a certain input intensity. From Eq. 2, it is easily derived that the maximum value is given
approximately by $C_{max} \approx \frac{2}{3} n_c B \ln(2 I_0/I_n)$, the maximum being achieved for an intensity
$I_{max} \approx (I_0^2 I_n/2)^{1/3}$. The reason for this behaviour is that if we consider any particular channel, the
signals in the other channels appear as noise in the channel of interest, due to the nonlinearities.
This 'noise' power increases with the 'signal' strength, thus causing degradation of the capacity
at large 'signal' strength. The behaviour of Eq. 2 is graphically illustrated in Fig. 1, where the
spectral efficiency (bits transmitted per second per unit bandwidth) is shown as a function of
input power.
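
The peak can be located numerically. The sketch below assumes the bound has the form $\ln(1 + e^{-x} I / (I_n + (1 - e^{-x}) I))$ per channel with $x = (I/I_0)^2$, takes $I_0 = 32$ mW from the text, and uses an illustrative noise power $I_n = 10.6$ μW (an assumption, corresponding to a single amplifier with $a = 2$, $G = 1000$, $\nu = 200$ THz and $B = 40$ GHz). The quoted $C_{max}$ is an approximate expression, so the sketch checks only the peak position against $I_{max}$.

```python
import math

def spectral_efficiency_bound(i_w, i0_w, in_w):
    """Per-channel lower bound log2(1 + SNR_eff), in bits/s/Hz of channel bandwidth."""
    x = (i_w / i0_w) ** 2
    lost_fraction = -math.expm1(-x)            # 1 - exp(-x), accurate for small x
    coherent_w = (1.0 - lost_fraction) * i_w   # exp(-x) * I
    snr_eff = coherent_w / (in_w + lost_fraction * i_w)
    return math.log2(1.0 + snr_eff)

I0_W = 32e-3     # intensity scale quoted in the text
IN_W = 10.6e-6   # assumed total noise power (illustrative)

# Logarithmic grid of input powers from 1 microwatt to 1 watt.
powers_w = [10.0 ** (-6.0 + 6.0 * k / 6000) for k in range(6001)]
i_peak_w = max(powers_w, key=lambda p: spectral_efficiency_bound(p, I0_W, IN_W))
peak_eff = spectral_efficiency_bound(i_peak_w, I0_W, IN_W)
i_max_formula_w = (I0_W ** 2 * IN_W / 2.0) ** (1.0 / 3.0)

print(f"numerical peak at {i_peak_w * 1e3:.2f} mW, {peak_eff:.2f} bits/s/Hz")
print(f"I_max estimate:   {i_max_formula_w * 1e3:.2f} mW")
```

With these inputs the peak falls at a few milliwatts, far below $I_0$, which is why the small-$x$ estimate of $I_{max}$ tracks the numerical maximum closely. Beyond the peak the bound falls away and is essentially zero by 1 W.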

It is of interest to note that if the input intensity is kept fixed, the capacity bound declines 
exponentially with the system length. This is only to be expected, since the correlations of 
the electric field should decay exponentially due to the fluctuating potential in the propagation
equation. On the other hand, the maximal spectral efficiency given by Cmax declines only 
logarithmically in system length, in parallel with the behaviour for linear channels. It can 
therefore be inferred that if the input power was adjusted with system length instead of being 
kept fixed, the decline of spectral efficiency with system length will be logarithmic. 

Finally, we present qualitative arguments as to why the single user case is expected to
show the same non-monotonicity of spectral efficiency with the input signal intensities. In the
multi-user case, the noise power effectively generated by cross phase modulation grows as
$I^3$, since it involves three signal photons. In the single user case, the cubic nonlinearity is a
deterministic process that does not necessarily degrade channel capacity. However, subleading
processes which involve two signal photons and one spontaneous noise photon still scale superlinearly in
signal intensity, as $I^2 I_n$. Therefore, one should still observe the same behaviour of the effective
noise power overwhelming the signal at large signal intensities. Thus, we would still expect the
spectral efficiency to decline at large input intensity, though not as rapidly as in the multi-user
(WDM) case.

Methods 

Gaussian bound to the channel capacity 

Proof of the inequality $I(X_G,Y) \ge I(X_G,Y_G)$: define $p(X,Y)$ as the product $p_G(X)\,p(Y|X)$,
and $p_G(X,Y)$ to be the joint Gaussian distribution having the same second moments as $p(X,Y)$.
Also define $p_G(Y)$ to be the corresponding marginal of $p_G(X,Y)$. Then

$I(X_G,Y) = \int dX\,dY\, p(X,Y) \log\left(\frac{p(X,Y)}{p_G(X)\,p(Y)}\right)$ (4)

$= \int dX\,dY\, p(X,Y) \log\left(\frac{p_G(X,Y)}{p_G(X)\,p_G(Y)}\right)$ (5)

$\quad - \int dX\,dY\, p(X,Y) \log\left(\frac{p_G(X,Y)\,p(Y)}{p(X,Y)\,p_G(Y)}\right)$. (6)

Since $p(X,Y)$ and $p_G(X,Y)$ share second moments, the first term on the RHS is $I(X_G,Y_G)$.
The second term may be simplified using the concavity of the logarithm, $\langle \log f \rangle \le \log \langle f \rangle$,
to obtain

$I(X_G,Y) \ge I(X_G,Y_G) - \log\left[\int dX\,dY\, p_G(X,Y)\, \frac{p(Y)}{p_G(Y)}\right]$ (7)

$\ge I(X_G,Y_G)$. (8)

The second inequality follows by first performing the integral over $X$, and noting that
$\log\left(\int dY\, p(Y)\right) = \log(1) = 0$.

Derivation of the average propagator $\langle U \rangle$:

This can be done by resumming the perturbation series exactly for $\langle U \rangle$, for delta correlated
$V(z,t)$. Alternatively, in the path integral formalism [7],

$\langle U(t,t';L) \rangle = U_0(t-t';L)\, \left\langle\left\langle \exp\left(i \int_0^L dz\, V(z,t(z))\right) \right\rangle\right\rangle$, (9)

where the average is taken over $V$ as well as over paths $t(z)$ satisfying $t(0) = t$, $t(L) = t'$.
The result in the paper follows by performing the Gaussian average over $V$. Since
$\phi = \int_0^L dz\, V(z,t(z))$ is a linear combination of Gaussian variables, it is also Gaussian distributed
and satisfies $\langle \exp(i\phi) \rangle = \exp(-\langle \phi^2 \rangle/2)$. The result follows by noting that for delta correlated
$V$, $\langle \phi^2 \rangle$ is a constant given by $\eta L$. The delta correlations need to be treated carefully; this can
be done by smearing the delta functions slightly, and leads to the definition of $\eta$ given earlier
in the paper.
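
The Gaussian average used above can be illustrated with a small Monte Carlo experiment. The sketch below makes two simplifying assumptions that are not taken from the text: the path average over $t(z)$ is ignored, and the delta-correlated potential is replaced by independent zero-mean Gaussian values $V_k$ on a grid of step $dz$ with variance $\eta/dz$ (a crude form of the "smearing"), so that $\langle \phi^2 \rangle = \eta L$. The parameter values are illustrative.

```python
import math
import random

def average_phase_factor(eta, length, n_steps=20, n_samples=20000, seed=0):
    """Monte Carlo estimate of Re<exp(i*phi)> for phi = sum_k V_k * dz."""
    rng = random.Random(seed)
    dz = length / n_steps
    sigma_v = math.sqrt(eta / dz)   # per-step standard deviation of V_k
    total = 0.0
    for _ in range(n_samples):
        phi = sum(rng.gauss(0.0, sigma_v) for _ in range(n_steps)) * dz
        total += math.cos(phi)      # the imaginary part averages to zero by symmetry
    return total / n_samples

eta, length = 0.01, 100.0           # illustrative values with eta * L = 1
estimate = average_phase_factor(eta, length)
predicted = math.exp(-eta * length / 2.0)
print(round(estimate, 3), round(predicted, 3))
```

The sampled average agrees with $\exp(-\eta L/2)$ to within statistical error, and the agreement does not depend on the number of grid steps, since only the product $\eta L$ enters.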

References 

[1] Shannon, C. E. A mathematical theory of communication. Bell Syst. Tech. J. 27, p. 379-423,
p. 623-656 (1948).

[2] Biglieri, E., Proakis, J. & Shamai, S. Fading channels: Information-theoretic and
communications aspects. IEEE Transactions on Information Theory 44:6, p. 2619-2692 (1998).

[3] Glass, A. M. et al. Advances in Fiber Optics. Bell Labs Technical Journal 5, p. 168 (2000).

[4] Agrawal, G. P. Nonlinear Fiber Optics, Academic Press, Inc., San Diego, 1995.

[5] Agrawal, G. P. Fiber-Optic Communication Systems, John Wiley & Sons, Inc., New York,
1992, pp. 334.

[6] Cover, T. M. & Thomas, J. A. Elements of Information Theory, John Wiley & Sons, Inc., New York,
1991.

[7] Feynman, R. P. & Hibbs, A. R. Quantum Mechanics and Path Integrals, McGraw-Hill,
New York, 1965.






Acknowledgements 

We gratefully acknowledge discussions with E. Telatar, R. Slusher, A. Chraplyvy, G. Foschini
and other members of the fiber capacity modelling group at Bell Laboratories. We would also
like to thank D. R. Hamann and R. Slusher for careful readings of the manuscript.

Figure Captions 

Figure 1. The curves in Fig. 1 represent lower bounds to the spectral efficiency for a homogeneous
length of fiber for a multi-user WDM system, given analytically by Eq. 2. Although
the curves represent lower bounds, we argue in the text that the true capacity shows
the same qualitative non-monotonic behaviour with respect to input signal powers. The
spectral efficiencies displayed in the figure correspond to the capacity per unit bandwidth,
$\mathcal{C} = C/(n_c\,\delta\nu)$. Here $\delta\nu$ includes both the channel bandwidths and the inter-channel spacing.
The parameters used for the figure are $n_c = 100$, $L_{eff} = 100$ km, $D = 20$ ps/nm/km,
$\delta\nu = 1.5B$ where $B = 10$ GHz is the individual channel width. The two continuous
curves correspond to $\gamma = 1$/W/km and $\gamma = 0.1$/W/km, the lower curve corresponding to
$\gamma = 1$/W/km. The spontaneous noise strength is computed from the formula $I_n = a G h \nu B$ as
explained in the text, with $a = 2$, $G = 1000$, $\nu = 200$ THz. The dotted curve represents
the spectral efficiencies of the corresponding linear channels given by $\gamma = 0$.